Text Clustering with Fuzzy Measure of Descriptors Weight

Authors

  • Ibtissam El Hassani
  • Moulay Ismail
Abstract

Our work implements a new two-dimensional descriptor for text mining. Morphosyntactic analysis of words with natural language processing techniques normally discards additional information; rather than neglect it, we place it in a new dimension. This requires rewriting the weights of descriptors in documents using a new "fuzzy" measure. Applying this approach to an Arabic corpus involved transforming the words of each text into a set of (root, pattern) pairs, which serve as the descriptors of our corpus. Morphosyntactic analysis returns all possible analyses of a word rather than a single solution. We therefore apply a Hidden Markov Model as a morphosyntactic post-analysis step to select the most likely analysis based on the context of the word. We show that this achieves higher precision than the conventional Vector Space Model representation and Latent Semantic Analysis in the context of Arabic text clustering.

Keywords: Text Mining, Text Processing, Hidden Markov Models, Natural Language Processing, Fuzzy Logic
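The post-analysis step described in the abstract, choosing one analysis per word from several candidates based on context, can be illustrated with a Viterbi search. The following is a minimal sketch and not the authors' implementation: it assumes the hidden states are the patterns of the candidate (root, pattern) pairs, that each word has at least one candidate, and that emission probabilities are uniform over a word's candidates and can therefore be ignored. The function name, the transliterated roots and patterns, and all probability values are hypothetical.

```python
import math


def viterbi_disambiguate(candidates, trans, start, floor=1e-6):
    """Pick one (root, pattern) analysis per word from its candidate list.

    candidates: one list per word, each a non-empty list of (root, pattern).
    trans: dict (prev_pattern, pattern) -> transition probability (hypothetical).
    start: dict pattern -> probability of opening a sentence (hypothetical).
    floor: small probability used for events missing from the tables.
    """
    if not candidates:
        return []
    # best[i][j] = (log score of best path ending in candidate j, backpointer)
    best = [[(math.log(start.get(pat, floor)), None) for _, pat in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for _, pat in candidates[i]:
            scores = [
                best[i - 1][k][0] + math.log(trans.get((prev_pat, pat), floor))
                for k, (_, prev_pat) in enumerate(candidates[i - 1])
            ]
            k_best = max(range(len(scores)), key=scores.__getitem__)
            row.append((scores[k_best], k_best))
        best.append(row)
    # Backtrack from the highest-scoring final candidate.
    j = max(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path


# Hypothetical two-word input: the second word is ambiguous between two patterns.
candidates = [
    [("drs", "fa3ala")],
    [("ktb", "fa3ala"), ("ktb", "fu3ul")],
]
trans = {("fa3ala", "fu3ul"): 0.7, ("fa3ala", "fa3ala"): 0.1}
start = {"fa3ala": 0.6, "fu3ul": 0.4}
chosen = viterbi_disambiguate(candidates, trans, start)
```

With these illustrative tables, the context of the first word favors the second candidate for the ambiguous word. A fuller model would also need emission probabilities and transition estimates learned from an annotated corpus, which the abstract does not specify.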


Similar resources

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...


Content-based Dynamic Email Spam Detecting Using Fuzzy Granular Computing Approach

Spam detection is a significant problem that many researchers have addressed with a variety of strategies. The best spam detection techniques should scan the content of messages to find spam. This research concerns the development of a certain category of granular computing as a classifier for spam detection. In this research, Fuzzy Granular Computing Classifica...


A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country and educational background. The existing approaches for author profiling suffer from problems like high dimensionality of features and fail to capture th...


DESIGN AND IMPLEMENTATION OF FUZZY EXPERT SYSTEM FOR REAL ESTATE RECOMMENDATION


Information extraction and imprecise query answering from web documents

Word based searches for relevant information from texts retrieve a huge collection and burden the user with information overload. Ontology based text information retrieval can perform concept-based search and extract only relevant portions of text containing concepts that are present in the query or those that are semantically linked to query concepts. While these systems have better precision ...



Publication date: 2014